Distributional Memory: A General Framework for Corpus-Based Semantics

نویسندگان

  • Marco Baroni
  • Alessandro Lenci
چکیده

Research into corpus-based semantics has focused on the development of ad hoc models that treat single tasks, or sets of closely related tasks, as unrelated challenges to be tackled by extracting different kinds of distributional information from the corpus. As an alternative to this “one task, one model” approach, the Distributional Memory framework extracts distributional information once and for all from the corpus, in the form of a set of weighted word-link-word tuples arranged into a third-order tensor. Different matrices are then generated from the tensor, and their rows and columns constitute natural spaces to deal with different semantic problems. In this way, the same distributional information can be shared across tasks such as modeling word similarity judgments, discovering synonyms, concept categorization, predicting selectional preferences of verbs, solving analogy problems, classifying relations between word pairs, harvesting qualia structures with patterns or example pairs, predicting the typical properties of concepts, and classifying verbs into alternation classes. Extensive empirical testing in all these domains shows that a Distributional Memory implementation performs competitively against task-specific algorithms recently reported in the literature for the same tasks, and against our implementations of several state-of-the-art methods. The Distributional Memory approach is thus shown to be tenable despite the constraints imposed by its multi-purpose nature.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A distributional memory for German

This paper describes the creation of a Distributional Memory (Baroni and Lenci 2010) resource for German. Distributional Memory is a generalized distributional resource for lexical semantics that does not have to commit to a particular vector space at the time of creation. We induce a resource from a German corpus, following the original design decisions as closely as possible, and discuss the ...

متن کامل

One Distributional Memory, Many Semantic Spaces

We propose an approach to corpus-based semantics, inspired by cognitive science, in which different semantic tasks are tackled using the same underlying repository of distributional information, collected once and for all from the source corpus. Task-specific semantic spaces are then built on demand from the repository. A straightforward implementation of our proposal achieves state-of-the-art ...

متن کامل

Compositional distributional semantics with compact closed categories and Frobenius algebras

The provision of compositionality in distributional models of meaning, where a word is represented as a vector of co-occurrence counts with every other word in the vocabulary, offers a solution to the fact that no text corpus, regardless of its size, is capable of providing reliable co-occurrence statistics for anything but very short text constituents. The purpose of a compositional distributi...

متن کامل

Functional Distributional Semantics

Vector space models have become popular in distributional semantics, despite the challenges they face in capturing various semantic phenomena. We propose a novel probabilistic framework which draws on both formal semantics and recent advances in machine learning. In particular, we separate predicates from the entities they refer to, allowing us to perform Bayesian inference based on logical for...

متن کامل

Textual Similarities Based on a Distributional Approach

The design of efficient textual similarities is an important issue in the domain of textual data exploration. Textual similarities are for example central in document collection structuring (e.g. clustering), or in Information Retrieval (IR) which relies on the computation of textual similarities for measuring the adequacy between a query and documents. The objective of this paper is to present...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • Computational Linguistics

دوره 36  شماره 

صفحات  -

تاریخ انتشار 2010